Abstract


In 2004, the White House and then Congress determined that there should be an “Information Sharing Environment” that facilitates the flow of critical information for counterterrorism, related law enforcement, and disaster management activities. That work has been progressing, but a major challenge is how to create technologies that: ensure compliance with the laws and policies of the federal government, fifty states, and individual agencies; convey the data needed to support access control and privilege decisions in different jurisdictions; and achieve accountability and transparency for this activity. We have built a prototype of Fusion Center information sharing that shows significant progress in four areas: representing law in a policy language; reasoning over that law against data transactions occurring in a web environment (internet or intranet); acquiring necessary information from authoritative sources wherever they reside in the decentralized environment; and providing both a binary response suitable for automated workflow implementation and a detailed justification suitable for human validation of the conclusion. In this paper, we briefly describe the technologies employed for serializing the data and policy, reasoning over the rules contained in the policy, and displaying the results to users. Together, these provide a powerful tool supporting a range of necessary governmental functions, including access control, privilege management, audit, periodic reporting, and risk modeling.

INTRODUCTION 


After 9/11, a cry arose within the United States that the terrorist attack could have been averted if government agencies had shared what they knew with each other. While the accuracy of that claim remains in debate, there is significant evidence that agencies were sharing less than expected and that they would operate more effectively if they shared more information. Three years later, with little progress made toward that goal, the White House issued an Executive Order mandating the creation of an Information Sharing Environment; Congress reinforced this goal later the same year by mandating it in a new statute.¹ In the years since the goal was set, an impediment to implementation has been identified. The sharing is mandated to be performed “[t]o the maximum extent consistent with applicable law.” However, a gap exists between the laws and policies that government enacts to regulate the handling of information and the ability to enforce those policies in computer systems. There is a strong need to bridge that gap as more data is collected, shared, and manipulated, or is desired to be. Responsible managers and interested citizens alike are seeking the means to ensure that systems more effectively implement rules about privacy, security, and the appropriate conduct of government business. But while people can express rules with complex reasoning, context, and reference to information not contained in the subject data, information systems historically have not been able to process policies written this way.

For example, consider the following snippet of legislation enacted by the state of Maryland:

A. Subject to the provisions of Regulation .12B, the Central Repository and other criminal justice agencies shall disseminate CHRI, be it conviction or nonconviction criminal history record information, to a criminal justice agency upon a request made in accordance with applicable regulations adopted by the Secretary. A criminal justice agency may request this information from the Central Repository or another criminal justice agency only if it has a need for the information:

(1) In the performance of its function as a criminal justice agency; or

(2) For the purpose of hiring or retaining its own employees and agents.²

The intent of this legislation is clearly to regulate the transmission of sensitive criminal history record information so that it is used only for appropriate purposes. However, the interactions between this specific policy and other policies at the organization, state, and federal levels can be very complex, and it is not feasible for humans to reason over all of them simultaneously. In addition, the rules and terms used in policies often reference other policies and pieces of information located in different databases or organizations, which makes it difficult to verify compliance efficiently by hand. Finally, given the number of transactions that happen per day, if a violation does occur, it is difficult to verify exactly which information sharing transaction was non-compliant with the applicable policies.

Given that computers are already ubiquitous in data sharing environments because they make information easy to share and aggregate, it is worthwhile to investigate whether they can also solve the problems listed above. We built a prototype of an “accountable system” that addresses this challenge using semantic web technology. Semantic web technology generally seeks to express data on the internet in a way that lets machines reason over the semantics of the data more readily. An accountable system is one that both knows which policies apply to which data (policy awareness) and can reason over complex policy and the details of data transactions. Together, these two functions allow organizations to fulfill their obligations in a transparent and policy-aware manner. In this project, we modeled transactions between Fusion Centers, locations where state and federal agencies work cooperatively to address terrorism, crime, and emergency response. Our prototype shows that the authoritative sources of information needed to make policy-based decisions can remain, and be accessed, wherever they reside in the decentralized environment.

This paper presents a prototype system that models the data sharing workflow in a Fusion Center environment, with the following features:

1. An effective way to represent real legislation and policies in a computer-readable language that can be reasoned upon.

2. A model in which existing data can remain in disparate databases and servers that the reasoner accesses on the fly during reasoning.

3. A reasoner that can analyze transactions against rules and then present a justification of why each transaction is or is not compliant.

4. A user interface that analysts and law enforcement personnel can use to determine whether transactions comply with the applicable policies. The interface is designed for end users who have neither a legal nor a technical background, and it presents justifications in natural language.

The prototype demonstrates that such a reasoning system can be used to increase transparency and accountability in real data-sharing environments. Given any data sharing event, the reasoner can produce a transcript that shows exactly which pieces of data went into the decision, which parts of the law are relevant, and the apparent compliance or non-compliance with those parts.

Background 


Semantic Web and Linked Data

The primary motivation of the semantic web is that associating metadata with data on the web enables computers to perform more valuable computations than they could without knowing the semantics of the data at hand. In particular, websites today are designed primarily for human consumption, and machines have a hard time understanding the semantic content of any given page. If pages also provide machine-readable metadata, automated agents can more easily perform tasks on behalf of the user.

Linked Data is the notion that associating a unique identifier (a URI) with each piece of data makes it possible to create unambiguous references between pieces of data. This ability to create relationships between disparate datasets greatly increases the utility of the data and allows computers to reason over the relationships between data. In addition, it's no longer necessary to warehouse data in one centralized location, as data in one database can refer to data in another database by URI just as easily as it can refer to data in the same database. More details about Linked Data can be found in the work of Bizer et al.³

There has been much existing work on developing the technologies that enable the semantic web. The Resource Description Framework (RDF) is a data model that provides a way to describe the relationships between resources.⁴ RDF expresses statements as triples consisting of a subject, a predicate, and an object. Once every resource we want to talk about (actors, documents, transactions, policies, etc.) has been associated with a URI, RDF can describe the relationships between these resources (e.g., a subject “transaction”, a predicate “compliant with”, and an object “Federal Privacy Act”). In addition to a way of talking about the relationships between data, we also need a way to describe the hierarchy of classes and how they relate to each other. We do this through the Web Ontology Language (OWL).⁵ OWL allows each organization to specify the terms it uses by way of an ontology, and each organization can also specify the ways entities are related to each other (e.g., a police officer is a member of sworn law enforcement). In addition, OWL lets us reason across the objects of two different organizations without implicitly assuming that the organizations agree on the terminology being used. For example, our system won't assume that a Maryland police officer is interchangeable with a Massachusetts police officer unless that relationship is made explicit.
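To make the triple model and the Maryland/Massachusetts example concrete, the following is a minimal Python sketch, not the prototype's code (which used RDF and OWL tooling directly). Statements are stored as (subject, predicate, object) tuples of URIs, and a single RDFS-style rule propagates rdf:type up rdfs:subClassOf chains. The example.org namespaces and the resource names are hypothetical. Because each state's terms live in its own namespace, the two “police officer” classes stay distinct until someone asserts a link between them.

```python
# Minimal triple store sketch; the example.org namespaces are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
MD = "http://example.org/maryland#"
MA = "http://example.org/massachusetts#"

triples = {
    (MD + "PoliceOfficer", SUBCLASS_OF, MD + "SwornLawEnforcement"),
    (MA + "PoliceOfficer", SUBCLASS_OF, MA + "SwornLawEnforcement"),
    (MD + "officer42", RDF_TYPE, MD + "PoliceOfficer"),
}

def infer_types(graph):
    """Propagate rdf:type up rdfs:subClassOf chains until nothing new is derived."""
    graph = set(graph)
    while True:
        new = {
            (s, RDF_TYPE, sup)
            for (s, p, cls) in graph if p == RDF_TYPE
            for (sub, p2, sup) in graph if p2 == SUBCLASS_OF and sub == cls
        }
        if new <= graph:
            return graph
        graph |= new

closed = infer_types(triples)
# Same-jurisdiction inference succeeds; cross-jurisdiction inference does not.
print((MD + "officer42", RDF_TYPE, MD + "SwornLawEnforcement") in closed)  # True
print((MD + "officer42", RDF_TYPE, MA + "SwornLawEnforcement") in closed)  # False

# Only an explicit cross-ontology statement links the two vocabularies.
bridge = (MD + "PoliceOfficer", SUBCLASS_OF, MA + "SwornLawEnforcement")
linked = infer_types(triples | {bridge})
print((MD + "officer42", RDF_TYPE, MA + "SwornLawEnforcement") in linked)  # True
```

A real deployment would use an RDF library and an OWL reasoner rather than hand-written closure; the point here is only that equivalence across organizations must be stated, never assumed.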

These notions are particularly important for the applications we're exploring, because the fundamental problem we're dealing with is data being sent between organizations with different personnel and different information systems. If users, data, and policies can all be referred to in the same language by all organizations in the system, it's not necessary to warehouse the data in the same place in order to reason over it. In our system, we assign a URI to each resource we want to talk about, so each organization in our simulation can keep its data on separate servers. Systems located at each organization are nonetheless able to dereference data on other organizations' systems and reason over data and personnel from those organizations. This decentralized design does not require a central agency to watch over all transactions to ensure compliance with policy; each organization can ensure that the transactions it engages in comply with the policies relevant to it. In addition, since organizations have a way to describe how they store data and which policies are relevant to them, the nuances of each organization and its data can be described in the data itself.
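The decentralized dereferencing described above can be sketched as follows. This is an illustration under stated assumptions: a dictionary keyed by host name stands in for each organization's web server (no HTTP is performed), the example.org URIs are hypothetical, and predicates are shortened to plain strings. Dereferencing a URI drops its #fragment to find the document that describes it, and the reasoner's working graph is simply the union of what each organization publishes.

```python
from urllib.parse import urldefrag, urlparse

# Hypothetical stand-in for HTTP: each organization serves its own documents.
SERVERS = {
    "ma.example.org": {
        "http://ma.example.org/profiles/MiaAnalysa": {
            ("http://ma.example.org/profiles/MiaAnalysa#me", "memberOf",
             "http://ma.example.org/orgs/StatePolice#org"),
        },
    },
    "md.example.org": {
        "http://md.example.org/profiles/MauryCopp": {
            ("http://md.example.org/profiles/MauryCopp#me", "memberOf",
             "http://md.example.org/orgs/BaltimorePD#org"),
        },
    },
}

def dereference(uri):
    """Return the triples published in the document that a URI's fragment lives in."""
    doc, _fragment = urldefrag(uri)
    host = urlparse(doc).netloc
    return SERVERS.get(host, {}).get(doc, set())

def gather(uris):
    """Merge triples from each organization's server without a central store."""
    graph = set()
    for uri in uris:
        graph |= dereference(uri)
    return graph

graph = gather([
    "http://ma.example.org/profiles/MiaAnalysa#me",
    "http://md.example.org/profiles/MauryCopp#me",
])
print(len(graph))  # 2
```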

Goals of Accountable Systems

Accountable systems are an alternative way to consider privacy and security in computer systems. Almost all existing systems treat data security as the problem of safeguarding private information within certain predefined boundaries. However, private data can often be used appropriately in some contexts, while use of the same data in other contexts is noncompliant with policy. Thus, it is worthwhile to design policies and technology that emphasize accountability rather than impenetrability. Rather than limiting our focus to preventing breaches of private data, we should design systems that are aware of appropriate use and data provenance, so that once a breach occurs, it is easier to determine the source of the problem and deal with the data release after the fact.

Specifically, in this case, we want to use the ideas of accountable systems to give governments increased confidence that they can audit policy-compliant data sharing. For example, if two parties share data about a person, then a manager, an inspector, or a court should be able to review why the system concluded that the data sharing event was compliant under the policies governing the transaction. Instead of relying on a “black box” giving a binary assertion about the validity of the transaction, it should be possible to show exactly why the transaction was considered valid under the law. Similarly, if a non-compliant transaction is identified, it should be possible to pinpoint exactly what part of the transaction is questionable and resolve the matter accordingly.⁶

Workflow Overview

The implemented system can be queried with hypothetical situations, in which a user asks whether a document can be sent between two parties. Both users are assumed to have profiles detailing their various affiliations and other relevant information. Such infrastructure already exists in almost every organization in the form of databases of personnel information. In addition, the document is assumed to be annotated with information that describes its content; the technology to embed machine-readable metadata into document files is already prevalent in commercial document editors. It is also assumed that the law has been transcribed into computer-readable policy. Such transcriptions can be done by a policy author and need to be done only once per policy that needs to be reasoned over.

The user gives URIs (Uniform Resource Identifiers) for each of these components to the system through a web interface. The system then displays a justification of whether or not the hypothetical transaction is valid. For our prototype, which uses hypotheticals modeled on real-world scenarios, a user can see exactly which pieces of the transaction fit together with which clauses of the policy to produce the compliant or non-compliant result.

Components of a Data Transaction

Rules

The rules governing a data transaction are whatever policies are applicable to the particular data, actors, actions, and context. Often, many different rules from different domains apply simultaneously to a given transaction. For example, if one shares data between two different states, the data protection laws of both states need to be applied to the transaction: at a minimum, the law regulating what a sender may release and the law regulating what a receiver may view or store. These policies may use different vocabularies to describe the transaction and may use completely different data sources to reach a sound justification.

Each policy has its own notion of how terms are defined and related. Thus, each policy must also include an ontology of terms, both so that it can reason about how terms relate to each other (Is a “police officer” a member of “sworn law enforcement”?) and so that the meanings of two policies do not become conflated during reasoning.

Data

Historical approaches to applying rules to data focus on categorizing the data in question. For example, in the government arena, rules are applied broadly across categories such as criminal case investigations, or sub-categories describing the type of investigation (e.g., drugs, kidnapping, tax fraud). However, this historical approach falls short of what the law requires today, and a different approach is necessary. Consider this segment of a single sentence in Massachusetts law:

Information shall be provided or made available... only if the individual named in the request or summary has been convicted of a crime punishable by imprisonment for a term of five years or more, or has been convicted of any crime and sentenced to any term of imprisonment, and at the time of the request: is serving a sentence of probation or incarceration, or is under the custody of the parole board...⁷

This example requires a system implementing the rule to know not only that the data being acted upon falls into the broad class of criminal records, but also to represent information from within the data itself, such as: the name of the criminal subject, the specific statute(s) under which the subject was convicted, the length of the sentence imposed, and the current status of the convict.
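The Massachusetts excerpt can be read as a small boolean test over facts drawn from inside the record. The Python sketch below is illustrative only: the field names, the custody-status labels, and the example.org URIs are hypothetical, and because the quoted sentence is elided, grouping it as (five-year offense OR any imprisonment) AND current custody is an assumed reading rather than a legal interpretation. The values 20 and parole mirror the metadata examples used in this paper.

```python
from dataclasses import dataclass

# Statuses treated as satisfying "at the time of the request" (assumed labels).
CURRENT_CUSTODY = {"probation", "incarceration", "parole"}

@dataclass
class ConvictionRecord:
    subject_uri: str                    # URI naming the person in the data
    statute_uri: str                    # statute under which convicted
    max_allowable_sentence_years: int   # term the statute makes punishable
    sentenced_to_imprisonment: bool     # any term of imprisonment imposed
    custody_status: str                 # status on the date of the request

def ma_dissemination_allowed(rec: ConvictionRecord) -> bool:
    """Assumed reading: (five-year offense OR any imprisonment) AND current custody."""
    serious = rec.max_allowable_sentence_years >= 5 or rec.sentenced_to_imprisonment
    in_custody = rec.custody_status in CURRENT_CUSTODY
    return serious and in_custody

rec = ConvictionRecord(
    subject_uri="http://example.org/subjects/rbg",
    statute_uri="http://example.org/statutes/example",
    max_allowable_sentence_years=20,
    sentenced_to_imprisonment=True,
    custody_status="parole",
)
print(ma_dissemination_allowed(rec))  # True
```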

Entities Described in the Data

This rule requires the system to identify at least three different kinds of people: people who provide information, people who receive information, and people who are the subjects of information. Many systems can handle the first two as system users (discussed further below) but have no mechanism to easily communicate the details of a person within the target data. Our system uses semantic web techniques to represent these properties. For example, the data subject could be identified as follows:

rdf:about="http://dig.csail.mit.edu/2010/DHS-fusion/MD/CHRI/Guy_Robert_B#rbg"

which tells the system the URI that identifies Robert B. Guy, the person in the data.

Rule in the Data

Establishing that the person in the data “has been convicted of a crime punishable by imprisonment for a term of five years or more” is done by including a tag that indicates the conviction and gives the URI of the relevant statute:

<mdccl:convicted_pursuant rdf:resource="http://law.justia.com/maryland/codes/gps/11-114.html"/>

and by including a second tag for the maximum allowable sentence under that statute, along with the value itself:

<mdccl:maximum_allowable_sentence_length>20</mdccl:maximum_allowable_sentence_length>
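Tags like these can be read with a standard XML parser before reasoning. The following Python sketch uses only xml.etree.ElementTree. The real mdccl namespace URI is not given in this paper, so http://example.org/mdccl# is a hypothetical placeholder, as are the subject and statute URIs. The sketch pulls out the statute URI and the maximum allowable sentence for each described subject, which is enough to test the “five years or more” element.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MDCCL_NS = "http://example.org/mdccl#"  # hypothetical; the real namespace URI is not given

SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:mdccl="{MDCCL_NS}">
  <rdf:Description rdf:about="http://example.org/CHRI/rbg">
    <mdccl:convicted_pursuant rdf:resource="http://example.org/statutes/example"/>
    <mdccl:maximum_allowable_sentence_length>20</mdccl:maximum_allowable_sentence_length>
  </rdf:Description>
</rdf:RDF>
""".strip()

def conviction_facts(xml_text):
    """Map each subject URI to (statute URI, maximum allowable sentence)."""
    root = ET.fromstring(xml_text)
    facts = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        statute = desc.find(f"{{{MDCCL_NS}}}convicted_pursuant")
        max_len = desc.find(f"{{{MDCCL_NS}}}maximum_allowable_sentence_length")
        facts[subject] = (
            statute.get(f"{{{RDF_NS}}}resource") if statute is not None else None,
            int(max_len.text) if max_len is not None else None,
        )
    return facts

facts = conviction_facts(SNIPPET)
statute, max_years = facts["http://example.org/CHRI/rbg"]
print(max_years >= 5)  # True: the statute is punishable by five years or more
```

A production system would hand the RDF to the reasoner as triples rather than walking XML elements; this only shows where the values the rule depends on live in the markup.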

Temporal Reasoning

Another determinative fact about the data may require the ability to perform date calculations. Sharing of information is permitted if “at the time of the request: is serving a sentence of probation or incarceration.” We represented this as an instance of the subclass that represents custody status:

<mdccl:has_custody_status rdf:resource="mdccl:Parole"/>


Also included in the metadata is the sentence imposed:

<mdccl:sentence_imposed>5</mdccl:sentence_imposed>

With these two pieces of information, the system can calculate the end of the sentence from the date it was imposed and compare that to the “current” date, i.e., the date of the request for a data transaction.
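The date comparison described above might look like the following Python sketch. It assumes the sentence_imposed value is a whole number of years and that the imposition date is available alongside it; neither the unit nor the date field is shown in the metadata excerpt, and the dates below are hypothetical. add_years moves a February 29 start to February 28 when the target year is not a leap year.

```python
from datetime import date

def add_years(d: date, years: int) -> date:
    """Add whole years, moving Feb 29 to Feb 28 in non-leap target years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def serving_sentence(imposed_on: date, sentence_years: int, request_date: date) -> bool:
    """True if the request date falls within the imposed sentence."""
    end = add_years(imposed_on, sentence_years)
    return imposed_on <= request_date < end

# Hypothetical dates: a 5-year sentence imposed on 2007-03-01.
print(serving_sentence(date(2007, 3, 1), 5, date(2010, 6, 15)))  # True
print(serving_sentence(date(2007, 3, 1), 5, date(2012, 3, 1)))   # False
```

In the prototype this comparison happens inside the policy reasoning; the sketch only isolates the arithmetic the rule depends on.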

Actors

Real rules require the ability to represent details about users of the system at a fine granularity. Again, semantic web technology is well suited to this purpose because it can represent any fact about a user, from the more traditional static values of name, organization, and role to discoverable or computable ones such as a person's security keys or the privileges that an actor has within a system. Frequently, rules about data handling depend upon what the individual is doing at that moment. For example, the Maryland law allows access to information if and only if it is used in one of the following two ways:⁸

1. In the performance of its function as a criminal justice agency; or

2. For the purpose of hiring or retaining its own employees and agents.

This sort of information may not be inferable from within a system and may need to be collected as an assertion from the user.
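Collecting a purpose as a user assertion can be sketched as a check that calls back to the user only for facts the system cannot infer. In this Python illustration, the purpose labels, the prompt text, and the ask_user callback are hypothetical; in the prototype, pre-processing finds the assertions a rule needs and queries the user before reasoning begins.

```python
from typing import Callable, Optional, Tuple

# The two needs permitted by the Maryland regulation quoted earlier (labels are hypothetical).
PERMITTED_PURPOSES = {
    "criminal_justice_function",      # (1) performing its criminal justice function
    "hiring_or_retaining_employees",  # (2) hiring or retaining its own employees/agents
}

def md_request_allowed(
    requester_is_cja: bool,
    ask_user: Callable[[str], Optional[str]],
) -> Tuple[bool, str]:
    """Check the regulation, collecting the purpose as an assertion from the user."""
    if not requester_is_cja:
        return False, "requester is not a criminal justice agency"
    # The purpose cannot be inferred from stored data, so the user is asked.
    purpose = ask_user("For what purpose is this information needed?")
    if purpose in PERMITTED_PURPOSES:
        return True, f"requester asserted purpose: {purpose}"
    return False, "no permitted purpose was asserted"

print(md_request_allowed(True, lambda question: "criminal_justice_function"))
```

Recording the assertion in the result string keeps it available for the justification, so a reviewer can see that the outcome rested on the user's claim.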

Actions

In order to have meaning, a data usage rule must in some way reference what action is being taken vis-à-vis the target data; it must say that an actor can or cannot do something with particular data. These rules refer to actions such as “collect,” “retain,” “copy,” “share,” and “delete.” Often the action is described using common words, such as “disseminate” or “share,” without any definition: for example, without specifying whether these terms apply equally to making data available through push or pull. Because we were modeling the information sharing environment, we focused on sharing rules for the prototype, but other actions could readily be represented.
Modeling the Component Parts in RDF

Rules (AIR)

The rules in our prototype are represented in the AIR (Accountability in RDF) policy language, as described by Kagal et al.⁹ AIR permits policies to be expressed as a series of patterns representing criteria to be met for compliance with a particular rule; this works well with legal rules, which are often described as having “elements”, such as the four fair use factors of copyright law. So that the prototype is accessible for evaluation and validation by a broad array of interested parties (e.g., government executives, policy leaders, lawyers, and the professionals who need to share the information), the sub-rules are coded in the order in which they appear in the statute and annotated with their legal citations. This is particularly challenging because law is generated through negotiation and does not generally follow formal logic structures.

We expect that some organizations will have the resources and interest to create their own representation of every rule, but that many will opt for a baseline available from a rules library; even in the latter case, there will be law, legal counsel opinion, or policy that is unique to an organization. For this reason, and to demonstrate operation in a decentralized environment, we modeled each organization as having a rules library somewhere within its own network.
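AIR policies are written in RDF (N3), and the Python below is not AIR syntax. It is a hedged analogue of the structure just described: a rule is an ordered list of elements, each carrying its legal citation, and evaluation yields both a binary result for workflow use and a per-element record for human review. The two citations are taken from the prototype's Lawyer-pane output; the requirement wording and the fact names used by the checks are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Facts = Dict[str, bool]

class Element(NamedTuple):
    citation: str                     # legal citation shown to reviewers
    requirement: str                  # plain-language statement of the element
    test: Callable[[Facts], bool]     # hypothetical check over transaction facts

# Elements in statutory order. The citations appear in the prototype's output;
# the fact names used by the checks are illustrative only.
MGL_6_172 = [
    Element("MGL 6.172, Para. 1, Sent. 2",
            "the recipient is performing criminal justice duties",
            lambda f: f.get("recipient_performing_cj_duties", False)),
    Element("MGL 6.172, Para. 2",
            "the recipient belongs to an agency certified by the board",
            lambda f: f.get("recipient_agency_certified", False)),
]

def evaluate(rule: List[Element], facts: Facts) -> Tuple[bool, List[Tuple[str, str, bool]]]:
    """Binary result for workflow use plus per-element results for human review."""
    report = [(e.citation, e.requirement, bool(e.test(facts))) for e in rule]
    return all(ok for _, _, ok in report), report

ok, report = evaluate(MGL_6_172, {"recipient_performing_cj_duties": True,
                                  "recipient_agency_certified": False})
print(ok)  # False
for citation, requirement, met in report:
    print("met" if met else "NOT met", "-", requirement, f"({citation})")
```

Keeping elements in statutory order, rather than reordering them for efficiency, is what lets a lawyer read the report side by side with the statute.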


Figure 1: Overview of the Fusion Center System


Actors (FOAF)

The set of attributes in a user profile is normally quite limited in organizational databases. We wanted to be able to express essentially anything that might come up in a rule, and so chose to adapt FOAF (Friend of a Friend) profiles to represent actors. The FOAF ontology is a relatively short list of attributes, but it is possible to add an unlimited number of additional attributes, so long as each is given a URI and, preferably, associated with a definition in a supplemental ontology.

Architecturally, we assumed that each organization would continue to control the user profiles of its employees, members, etc. We did not build, but assumed that each organization would ultimately add, a security layer that determines how much of a profile to reveal to a requesting system. For example, if an outside organization asks for specific user details in order to reason over them, there should be a system that determines which attributes are revealable and which are private.
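Since this security layer was not built, the following Python sketch is purely hypothetical: one way a profile server might filter attributes by the class of the requesting system. foaf:name and foaf:mbox are real FOAF properties; the ex: attributes, the requester classes, and the sample values are invented for illustration. Unknown requesters fall back to the public subset.

```python
# Hypothetical disclosure policy: which profile attributes each class of
# requester may see. foaf:name and foaf:mbox are real FOAF terms; the ex:
# attributes stand for organization-specific extensions.
PUBLIC = {"foaf:name", "foaf:mbox"}
DISCLOSURE = {
    "same_organization": PUBLIC | {"ex:role", "ex:memberOf", "ex:clearance"},
    "partner_agency": PUBLIC | {"ex:role", "ex:memberOf"},
}

def reveal(profile: dict, requester_class: str) -> dict:
    """Return only the attributes the requester may see; unknown classes get PUBLIC."""
    allowed = DISCLOSURE.get(requester_class, PUBLIC)
    return {attr: value for attr, value in profile.items() if attr in allowed}

profile = {
    "foaf:name": "Example Analyst",
    "foaf:mbox": "mailto:analyst@example.org",
    "ex:role": "Intelligence Analyst",
    "ex:memberOf": "http://example.org/orgs/fusion-center#org",
    "ex:clearance": "Secret",
}
print(sorted(reveal(profile, "partner_agency")))
```

A real layer would also need to authenticate the requester and log disclosures, since what was revealed becomes part of the audit trail.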

Data (PDF/XMP)

Data can be retained in many forms, including email, text documents, databases, and spreadsheets. Since commercial software such as Adobe Acrobat already supports annotating documents with RDF, we were not concerned with modeling each structured and unstructured form. For our prototype, we modeled a series of three memos (a request for information about a possible criminal suspect, and responses to it), all in PDF with RDF in embedded XMP metadata.
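Reading the embedded RDF back out of such a PDF can be sketched with the standard library alone, under one significant assumption: the XMP metadata stream is stored uncompressed, which is common but not guaranteed (compressed streams would require a real PDF parser). The sketch finds the x:xmpmeta packet in the raw bytes and parses it as XML. The tiny byte string is a hypothetical stand-in for a PDF, not one of the prototype's memos.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

def extract_xmp(pdf_bytes: bytes):
    """Return the first XMP packet's root element, or None if none is found.

    Assumes an uncompressed metadata stream; compressed streams are not handled."""
    m = XMP_RE.search(pdf_bytes)
    return ET.fromstring(m.group(0)) if m else None

# Hypothetical stand-in for a PDF containing an uncompressed XMP packet.
FAKE_PDF = b"""%PDF-1.4
1 0 obj << /Type /Metadata /Subtype /XML >> stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/documents/request"/>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream endobj
%%EOF"""

root = extract_xmp(FAKE_PDF)
subjects = [d.get(f"{{{RDF_NS}}}about") for d in root.iter(f"{{{RDF_NS}}}Description")]
print(subjects)
```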

Reasoning Over the Transaction

Reasoner

A transaction is evaluated against the policy by a forward-chaining reasoner, cwm. Because of this design choice, the reasoner itself cannot issue calls for more information, so pre-processing must deliver all the necessary data to the reasoner. For example, the prototype automatically identifies to the reasoner the URLs of the sender's profile, the proposed recipient's profile, and the target data; it also crawls those files for references to other policies or ontologies and delivers those URLs to the reasoner as well. In addition, as alluded to earlier, the system searches the rules for any assertions that it will need, queries the user for them, and delivers the results to the reasoner.
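Because cwm cannot fetch more data mid-run, the crawl has to close over every reference up front. A breadth-first traversal is one straightforward way to do that; this Python sketch uses a hypothetical REFERENCES table in place of actually downloading each file and scanning it for owl:imports or policy links, and all URLs are invented for illustration.

```python
from collections import deque

# Hypothetical stand-in for fetching a document and listing the policy and
# ontology URLs it references (e.g., via owl:imports or a policy link).
REFERENCES = {
    "http://ex.org/profiles/sender": ["http://ex.org/ont/agency.n3"],
    "http://ex.org/profiles/recipient": ["http://ex.org/ont/agency.n3"],
    "http://ex.org/docs/request.pdf": ["http://ex.org/rules/MGL_6-172.n3"],
    "http://ex.org/rules/MGL_6-172.n3": ["http://ex.org/ont/chri.n3"],
}

def collect_inputs(seeds):
    """Breadth-first crawl so every referenced file can be handed to the reasoner up front."""
    seen, queue, order = set(seeds), deque(seeds), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for ref in REFERENCES.get(url, []):
            if ref not in seen:       # each file is delivered only once
                seen.add(ref)
                queue.append(ref)
    return order

inputs = collect_inputs([
    "http://ex.org/profiles/sender",
    "http://ex.org/profiles/recipient",
    "http://ex.org/docs/request.pdf",
])
print(len(inputs))  # 6
```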

TMS

The reasoner incorporates a Truth Maintenance System (TMS), a dependency-tracking mechanism. This allows the system to retain the dependencies upon which it relied to form its conclusions. For our prototype, this is extremely useful because it allows users to see the basis for a decision, a function not available from some other policy reasoners. It is also an efficient mechanism for storing the information necessary for aggregate reporting, risk modeling, or auditing at a later time.
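The combination of forward chaining and dependency tracking can be illustrated in a few lines. The Python sketch below is a simplification, not cwm's or AIR's TMS: facts are plain strings, each rule is (name, premises, conclusion), and every derived fact records which rule and premises produced it, so why() can unwind a conclusion back to the given facts. The fact strings and rule names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[str, FrozenSet[str], str]   # (name, premises, conclusion)

def forward_chain(facts, rules: List[Rule]):
    """Derive conclusions to a fixpoint, recording each conclusion's support."""
    known = set(facts)
    support: Dict[str, Tuple[str, FrozenSet[str]]] = {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                support[conclusion] = (name, premises)  # the recorded dependency
                changed = True
    return known, support

def why(fact, support, depth=0):
    """Unwind recorded dependencies into an indented justification."""
    if fact not in support:
        return ["  " * depth + f"{fact} (given)"]
    name, premises = support[fact]
    lines = ["  " * depth + f"{fact} (by {name})"]
    for p in sorted(premises):
        lines += why(p, support, depth + 1)
    return lines

facts = {"data is CORI", "recipient performing CJ duties", "recipient in certified agency"}
rules = [
    ("MGL 6.172 Para. 1", frozenset({"data is CORI", "recipient performing CJ duties"}), "purpose element met"),
    ("MGL 6.172 Para. 2", frozenset({"recipient in certified agency"}), "agency element met"),
    ("compliance", frozenset({"purpose element met", "agency element met"}), "transaction compliant"),
]
known, support = forward_chain(facts, rules)
print("\n".join(why("transaction compliant", support)))
```

The indented output is essentially what the “Why?” view presents: each belief followed by the beliefs and given facts it depended on.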


[Screenshot: a form with fields for the sender's URL, the data's URL (or a file selected from a list), the recipient's URL, and a policy's URL; an option to display results using the Tabulator Firefox browser extension; and a Submit button.]

Figure 2: Web Interface for Transactions


Visibility to Users

Input: Transaction Simulator

People act on data using many systems and platforms. Rather than separately model transactions in email, various portals, databases, etc., we created a user interface that is intended to provide a view into the middleware, allowing the user to identify the minimal data that would be identified to the accountable system regardless of application or platform: the sender, the target data, and the recipient.

In the UI, the sender and recipient are identified by email address, a commonly known identifier that is presumed to be readily linked to the URL of the user profile (FOAF file); the individual's picture and URL are automatically populated on the page. In addition, choosing the data to be sent causes the UI to find and auto-populate the URL of the applicable policy. If necessary to model a variant, the user can override the policy linked to the data with a different policy.
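The simulator's input step amounts to resolving what the user types into the four URIs the accountable system needs. In this Python sketch the two lookup tables are hypothetical stand-ins for whatever directory links an email address to a FOAF profile and a document to its policy; all addresses and URLs are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical directories standing in for organizational lookups.
PROFILE_BY_EMAIL = {
    "analyst@ma.example.org": "http://ma.example.org/profiles/analyst#me",
    "officer@md.example.org": "http://md.example.org/profiles/officer#me",
}
POLICY_BY_DATA = {
    "http://ma.example.org/documents/request.pdf": "http://ma.example.org/rules/MGL_6-172.n3",
}

def build_transaction(sender_email: str, recipient_email: str, data_url: str,
                      policy_override: Optional[str] = None) -> Dict[str, str]:
    """Resolve the UI's inputs into the URIs handed to the accountable system."""
    return {
        "sender": PROFILE_BY_EMAIL[sender_email],
        "recipient": PROFILE_BY_EMAIL[recipient_email],
        "data": data_url,
        # The policy linked to the data is used unless the user overrides it.
        "policy": policy_override or POLICY_BY_DATA[data_url],
    }

tx = build_transaction("analyst@ma.example.org", "officer@md.example.org",
                       "http://ma.example.org/documents/request.pdf")
print(tx["policy"])
```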

Input: Tabulator Views

Many potential users or evaluators of the technology will not have the skill to read program code. Using a semantic web browser, Tabulator, to view our input code gives those users an opportunity to glimpse the meaning of the native RDF. Tabulator has multiple viewing panes, including a “FOAF View”, which displays user profiles in a visualization that looks more like a list of attributes, and a “Table View”, which displays an AIR policy's if-then-else structure in a nested chart.

Output: Tabulator Views

The accountable system's results can also be viewed in special Tabulator panes. The “Justification” pane first opens to a single sentence that indicates whether the proposed transaction is compliant or non-compliant with the policy. Pressing the “Why?” button displays the deep justification provided by the TMS, which shows each belief and the facts on which that belief depended.

The “Lawyer” pane provides a shorter form of the analysis by generating a series of near-grammatical sentences explaining the requirement of the rule, the relevant fact instances that meet or fail to meet that pattern, and the citation for the subsection of the law being applied; the first two are now represented as hyperlinks to the URLs to which they refer.
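Generating one such Lawyer-pane line can be sketched as simple templating over a fact URI, a requirement, an outcome, and a citation, with the fact and requirement rendered as hyperlinks. The Python below uses html.escape so URIs and text cannot break the markup; the sentence template, the function names, and the example.org URIs are hypothetical, while the citation format follows the prototype's output.

```python
from html import escape

def link(uri: str, text: str) -> str:
    """Render an HTML hyperlink, escaping both the URL and the label."""
    return f'<a href="{escape(uri, quote=True)}">{escape(text)}</a>'

def lawyer_sentence(fact_uri: str, requirement: str, rule_uri: str,
                    met: bool, citation: str) -> str:
    """One near-grammatical line: fact, requirement, outcome, and citation."""
    verb = "satisfies" if met else "does not satisfy"
    return (f"{link(fact_uri, fact_uri)} {verb} the requirement that "
            f"{link(rule_uri, requirement)}, as required by {escape(citation)}.")

print(lawyer_sentence(
    "http://example.org/profiles/officer#me",
    "the recipient is performing criminal justice duties",
    "http://example.org/rules/MGL_6-172#para1",
    True,
    "MGL 6.172, Para. 1, Sent. 2",
))
```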


[Figure 3 screenshot: the Tabulator Lawyer pane for a request for information about Robert B. Guy, evaluated against MGL c. 6, § 172. Each bullet states a rule requirement, the facts that meet it, and the paragraph and sentence of the statute applied; the pane concludes that the transaction is compliant with MGL c. 6, § 172.]

Figure 3: Lawyer View of Justification


Results 

Our goal was to model and execute six scenarios through the reasoner. We accomplished that goal and built a system with enough capability and flexibility to run previously undefined scenarios (mixing and matching the component pieces in unplanned ways) and still achieve correct results.

From the research perspective, this exercise served primarily to confirm expectations. First, it demonstrated a degree of scalability. In our earlier work [14], we fed the reasoner only the input necessary to reach a correct conclusion. In this work, we fed the reasoner a significant number of rule patterns and facts that were unnecessary to the conclusion and confirmed that, so long as the rules are expressed correctly, the correct result is produced: only the appropriate sub-rules are found to support relevant beliefs, and only the relevant facts are reported as dependencies. As the “so long as” clause implies, the work also showed the importance and necessity of validation, i.e., the ability to determine that the rules have been expressed correctly in their entirety: both each pattern and its relationship to all other patterns (e.g., conditions, exceptions, order).

We also showed that “broken” or undefined elements did not necessarily keep the system from reaching a conclusion. For example, if we run a particular scenario under the Massachusetts criminal records release law and the recipient has a malformed tag that was intended to identify him as a member of a criminal justice agency but fails to do so, the system correctly determines, by a later sub-rule in the policy, that he is entitled to receive such information as any member of the public may receive.
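The two behaviors described above, reporting only the facts a conclusion actually depended on and falling through to a later sub-rule when an earlier one cannot match, can be illustrated with a deliberately simplified Python sketch. It is not the AIR reasoner, which reasons over RDF and records justifications in a TMS. The flat fact dictionary, the sub-rule names, and the property names are hypothetical, and the two sub-rules only loosely paraphrase a records-release statute (CORI stands for criminal offender record information).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Facts = dict[str, str]  # hypothetical flat fact store: property name -> value


@dataclass
class SubRule:
    name: str
    needs: tuple[str, ...]          # the only facts this sub-rule reads
    test: Callable[[Facts], bool]   # pattern checked against those facts
    conclusion: str


def evaluate(rules: list[SubRule], facts: Facts) -> tuple[Optional[str], Optional[str], Facts]:
    """Try sub-rules in order, like an if / else-if chain.

    Returns (conclusion, rule name, dependencies). Only the facts the matching
    sub-rule declared in `needs` are reported as dependencies.
    """
    for rule in rules:
        # A missing fact (for example, a misspelled tag) means this sub-rule
        # simply does not match; evaluation moves on to the next one.
        if all(key in facts for key in rule.needs) and rule.test(facts):
            dependencies = {key: facts[key] for key in rule.needs}
            return rule.conclusion, rule.name, dependencies
    return None, None, {}


# Hypothetical sub-rules loosely paraphrasing a records-release statute.
RULES = [
    SubRule(
        "criminal-justice-agency",
        ("record_type", "recipient_agency_type"),
        lambda f: (f["record_type"] == "CORI"
                   and f["recipient_agency_type"] == "CriminalJusticeAgency"),
        "recipient may receive the full record",
    ),
    SubRule(
        "general-public",
        ("record_type",),
        lambda f: f["record_type"] == "CORI",
        "recipient may receive what any member of the public may receive",
    ),
]


if __name__ == "__main__":
    well_formed = {
        "record_type": "CORI",
        "recipient_agency_type": "CriminalJusticeAgency",
        "sender_office": "Example Fusion Center",  # irrelevant to both sub-rules
        "document_pages": "6",                      # irrelevant to both sub-rules
    }
    # Malformed tag: the agency-type property name is misspelled.
    malformed = dict(well_formed)
    malformed["recipient_agncy_type"] = malformed.pop("recipient_agency_type")
    for label, facts in (("well-formed", well_formed), ("malformed", malformed)):
        print(label, evaluate(RULES, facts))
```

With the well-formed tag, the first sub-rule matches and only `record_type` and `recipient_agency_type` are reported; the sender and document facts never appear. With the misspelled tag, the first sub-rule cannot match, and the general-public sub-rule still yields a conclusion. The sketch also shows why validation matters: the reasoner cannot tell a misspelled tag from a genuinely absent one.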

As part of this research, we also demonstrated the prototype to a variety of relevant stakeholders, ranging from Fusion Center analysts to Intelligence Community management, both technical and operational. Reactions were very positive about the potential for such an accountable system to fulfill government obligations to ensure that information sharing is handled in a policy-compliant manner and to provide a level of transparency to users.

The most significant resistance came from an analyst supervisor who saw the system as a potential management surveillance tool for questioning whether individual analysts know and comply with all rules. Even that individual, however, believed that the mechanism would be quite helpful when applying the rules of another jurisdiction (i.e., not one's own) and as a workflow management tool. Conversely, the analysts at a demo the next day were so enthusiastic that they asked whether they could build and use the FOAF-based user profiles immediately.

Related Work

Most of the work in this field has focused on building the individual parts of a system rather than on ensuring that all of the pieces can operate together. Our group has published extensively on the importance of system models that recognize accountability as a necessary property [15]. Other work has established the importance of isomorphism between natural-language law and its machine-readable format [16], which caused us to consider very carefully whether AIR is expressive enough to represent real laws. In addition, there are other policy languages similar to AIR, such as XACML [17] and EPAL, which can solve similar problems but without the level of justification granularity presented in this work.

There has also been work on similar problems in other domains. For example, existing work documents the use of semantic web technologies for policy management in the social web. In addition, similar work is being done on policy enforcement in federated environments, but that work does not appear to have focused on the design of the reasoner.

Future Work

We learned that the reasoner is relatively fast: for example, it processed what could amount to more than 100,000 possible pattern-match combinations (twenty-seven facts about the sender, twenty-five about the recipient, six about the document, and thirty-five rules) in 10 to 60 seconds. It cannot, however, produce the millisecond response needed to use the system as a real-time processor for programs that handle millions of transactions daily, such as customs border applications and passenger screening. The Fusion Centers we spoke with, by contrast, indicated that they produce few enough analytical reports per day that waiting that long for an evaluation would not be prohibitive. We would like to test other reasoning strategies to reduce evaluation time and increase throughput.
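The timing figures above can be checked with some back-of-envelope arithmetic. The Python sketch below assumes, since the text does not say how the 100,000 figure was computed, that the combination count is the product of the fact and rule counts (27 × 25 × 6 × 35 = 141,750), and it takes 1,000,000 transactions per day as an illustrative lower bound for “millions of transactions daily.” At 10 to 60 seconds per evaluation, a single serial reasoner can complete only 1,440 to 8,640 evaluations in a day.

```python
# Back-of-envelope check of the Future Work figures.
# Assumption: "pattern match combinations" = product of the fact and rule counts.
SENDER_FACTS = 27
RECIPIENT_FACTS = 25
DOCUMENT_FACTS = 6
RULE_COUNT = 35
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

combinations = SENDER_FACTS * RECIPIENT_FACTS * DOCUMENT_FACTS * RULE_COUNT


def serial_evaluations_per_day(seconds_per_evaluation: float) -> int:
    """Evaluations one reasoner can finish in a day if run back to back."""
    return int(SECONDS_PER_DAY // seconds_per_evaluation)


def reasoner_days(transactions: int, seconds_per_evaluation: float) -> float:
    """Days of serial reasoner time needed to evaluate `transactions` requests."""
    return transactions * seconds_per_evaluation / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"pattern-match combinations: {combinations:,}")  # 141,750
    for latency in (10, 60):
        print(f"{latency} s per evaluation: "
              f"{serial_evaluations_per_day(latency):,} evaluations/day; "
              f"1,000,000 transactions need {reasoner_days(1_000_000, latency):.1f} reasoner-days")
```

Under these assumptions, one million daily transactions would need roughly 116 to 694 reasoner-days of serial computation per day. Running reasoners in parallel divides those totals, but the gap between tens of seconds and the milliseconds required for real-time screening is about four orders of magnitude, while a Fusion Center producing a handful of reports a day sits comfortably within a single reasoner's capacity.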

We are also quite interested in coordinating with other test components and systems. Because the United States Constitution preserves state sovereignty, individual states need not follow a federal mandate on standards for an accountable system. As a result, we expect that a viable accountable system will always have to contend with more than one platform, policy language, and tagging scheme. Research is needed to determine the feasibility of, and strategies for, interchange among them.

Finally, we know that in human decision-making, complete information is not always available, yet decisions must be, and are, made. We would like to learn more about how an accountable system can effectively handle incomplete information.

CONCLUSION 


Government information sharing is mandated to be performed “[t]o the maximum extent consistent with applicable law.” To date, efforts to implement that mandate have been limited by brittle systems that require designers to predetermine all likely permissions and then hardwire them. Here, we represented complex policy, drew on an array of authoritative sources to implement that complexity, successfully reasoned over the policy, and determined the correct result of compliance or non-compliance. With this prototype, we have demonstrated the initial feasibility of an accountable system, narrowing the gap between the expectations of law and policy and the ability of technology to fulfill them.